Search results for "Supervised classification"
showing 7 items of 7 documents
Application of the Information Bottleneck method to discover user profiles in a Web store
2018
The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user click-stream behavior, to gain insight into characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Based on log data for a real online store, the efficiency of the approach in terms of its ability to differentiate between buying and non-buying sessions was validated, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…
Time series clustering with different distance measures to tell Web bots and humans apart
2022
The paper deals with the problem of differentiating Web sessions of bots and human users by observing some characteristics of their traffic at the Web server input. We propose an approach to cluster bots’ and humans’ sessions represented as time series. First, sessions are expressed as sequences of HTTP requests coming to the server at specific timestamps; then, they are preprocessed to form time series of limited length. Time series are clustered and the clustering performance is evaluated in terms of the ability to partition bots and humans into separate clusters. The proposed approach is applied to real server log data and validated with the use of different time series distance meas…
A bibliometric approach to finding fields that co-evolved with information technology
2020
Among declining industries, some, such as the music industry, have been revived by information technology (IT). At the same time, in academic fields, some have expected co-evolutions between IT and other fields to cause the resurgence of either field. In this research, the clustering of citation networks with 14,438 academic papers resulted in the identification of 28 academic fields in the areas “Computer Science” or “Information Science and Library Science.” Co-evolutions between these 28 fields and the fields citing them were evaluated by an investigation of contents; a methodology to search for co-evolutions was also proposed. This paper proposes that pairs of academic fields (wi…
Challenges and opportunities of Sentinel-2 in the monitoring of inland waters
2023
In freshwater ecosystems, the scarcity and pollution of this resource are prompting government bodies to include in their agendas strategies to mitigate the situation through sustainable management. The Water Framework Directive includes among its requirements the monitoring of the ecological status of inland waters to determine their quality. Satellite images offer a synoptic and continuous view from which metrics of ecological status can be derived. These metrics complement traditional sampling, since they increase the spatial coverage and the frequency of monitoring. Sentinel-2, with its sensor Mult…
Bot recognition in a Web store: An approach based on unsupervised learning
2020
Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning stra…
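The general idea of clustering first and labelling clusters afterwards can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses plain k-means (not Graded Possibilistic c-Means), labels each cluster by majority vote over the few sessions whose class is known, and the two session features and all values in `points` are made up for the example.

```python
import random
from collections import Counter

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of the members; keep the old centroid if a cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign

def label_clusters(assign, labels):
    """Map each cluster id to the majority label among its labelled members."""
    votes = {}
    for a, lab in zip(assign, labels):
        if lab is not None:  # None marks a session with unknown class
            votes.setdefault(a, Counter())[lab] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

# Hypothetical session features: (requests per minute, share of image requests).
points = [(1.0, 0.8), (1.2, 0.9), (0.9, 0.7), (9.0, 0.1), (9.5, 0.0), (8.8, 0.2)]
# Only two sessions carry a known label; the rest inherit their cluster's label.
labels = ["human", None, None, "bot", None, None]

_, assign = kmeans(points, k=2)
mapping = label_clusters(assign, labels)
predicted = [mapping[a] for a in assign]
print(predicted)
```

A cluster with no labelled member gets no entry in `mapping`, so real data would need a fallback for that case.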
Supervised Classifications of Optical Water Types in Spanish Inland Waters
2022
Remote sensing of lake water quality assumes there is no universal method or algorithm that can be applied in a general way on all inland waters, which usually have different in-water components affecting their optical properties. Depending on the place and time of year, the lake dynamics, and the particular components of the water, non-tailor-designed algorithms can lead to large errors or lags in the quantification of the water quality parameters, such as the suspended mineral sediments, dissolved organic matter, and chlorophyll-a concentration. Selecting the most suitable algorithm for each type of water is not a simple matter. One way to make selecting the most suitable water quality al…
Bacteria classification using minimal absent words
2017
Bacteria classification has been deeply investigated with different tools for many purposes, such as early diagnosis, metagenomics, and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacteria species based on a dissimilarity measure of purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that recently found applications in bioinformatics. We can therefore incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the com…
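For readers unfamiliar with the notion, a word is a minimal absent word of a sequence if it does not occur in the sequence but all of its proper factors do. The sketch below enumerates such words by brute force for a short string; it only illustrates the definition, is not the paper's algorithm or its dissimilarity measure, and the function name and `alphabet` parameter are chosen here for illustration.

```python
def minimal_absent_words(x, alphabet=None):
    """Return the set of minimal absent words of x (brute force, for short strings).

    A word a+u+b is minimal absent if it does not occur in x
    while both a+u and u+b do occur in x.
    """
    if alphabet is None:
        alphabet = sorted(set(x))
    # All factors (substrings) of x, including the empty word.
    factors = {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}
    # Length-1 case: a letter of the alphabet that never occurs in x.
    maws = {a for a in alphabet if a not in factors}
    # Length >= 2 case: extend every factor u by one letter on each side.
    for u in factors:
        for a in alphabet:
            if a + u not in factors:
                continue
            for b in alphabet:
                if u + b in factors and a + u + b not in factors:
                    maws.add(a + u + b)
    return maws

print(sorted(minimal_absent_words("abaab")))
# → ['aaa', 'aaba', 'bab', 'bb']
```

The factor set grows quadratically with the sequence length, so this enumeration suits only toy inputs; genome-scale sequences call for index-based methods.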